The Grammar of {ggplot2}

A graphics framework for elegant plotting in R

Cédric Scherer

2019-08-28

class: inverse, center
background-image: url("img/darklight_RichardStrozynski.jpg")
background-size: contain



# An Introduction to {ggplot2}
A graphics framework for elegant plotting in R



### Cédric Scherer
Leibniz Institute for Zoo and Wildlife Research Berlin
IZW Stats Group | 28th of August 2019



Image by Richard Strozynski

class: middle

Advantages of {ggplot2}

class: inverse


Figure in Scherer et al. 2019 J. Anim. Ecol.

class: inverse, center, middle



Contribution to #TidyTuesday

class: inverse, center, middle

The Setup

The data

We use data from the National Morbidity and Mortality Air Pollution Study (NMMAPS),
filtered for the city of Chicago and the timespan January 1997 to December 2000.

## Observations: 1,461
## Variables: 10
## $ city     <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic...
## $ date     <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997...
## $ death    <dbl> 137, 123, 127, 146, 102, 127, 116, 118, 148, 121, 110...
## $ temp     <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0,...
## $ dewpoint <dbl> 37.500, 47.250, 38.000, 45.500, 11.250, 5.750, 7.000,...
## $ pm10     <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121...
## $ o3       <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14...
## $ time     <dbl> 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662,...
## $ season   <fct> Winter, Winter, Winter, Winter, Winter, Winter, Winte...
## $ year     <fct> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997,...

The Structure of {ggplot2}


  1. Data → The raw data that you want to visualise

  2. Layers geom_ and stat_ → The geometric shapes and statistical summaries representing the data

  3. Aesthetics aes() → Aesthetic mappings of the geometric and statistical objects

  4. Scales scale_ → Maps between the data and the aesthetic dimensions

  5. Coordinate system coord_ → Maps data into the plane of the data rectangle

  6. Facets facet_ → The arrangement of the data into a grid of plots

  7. Visual themes theme() and theme_ → The overall visual defaults of a plot

1. Data: ggplot()

We need to specify the data and the two variables we want to plot as aesthetics of the ggplot() call:

.pull-left[

There is only an empty panel because
{ggplot2} doesn’t know how it should plot the data. ]

.pull-right[ ]
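A minimal sketch of such a call. The chic data frame below is a simulated stand-in (made-up temperatures on the same 1,461 dates), not the NMMAPS data, so that the block runs on its own:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# Data and aesthetics only: there is no layer yet, so only an empty panel is drawn
p <- ggplot(data = chic, mapping = aes(x = date, y = temp))
p
```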

1. Data: ggplot()

We need to specify the data and the two variables we want to plot as aesthetics of the ggplot() call:

.pull-left[ Since almost every ggplot() takes the same arguments (data, mapping = aes(x, y)),
we can also write:

… or add the aesthetics outside the ggplot function:

]

.pull-right[ ]
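Both spellings can be sketched as follows, again assuming a simulated stand-in for chic (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# Short form: data, mapping, x and y are matched by position
p1 <- ggplot(chic, aes(date, temp))

# Aesthetics added outside the ggplot() call
p2 <- ggplot(chic) + aes(x = date, y = temp)
```

Both objects carry the same x and y mapping; which form to use is a matter of taste.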

2. Layers: geom_*() and stat_*()

By adding one or multiple layers we can tell {ggplot2} how to represent the data.
There are lots of built-in geometric elements (geoms) and statistical transformations (stats):

Adapted from https://ggplot2.tidyverse.org/reference/

… and several more in extension packages, e.g. ggforce, ggalt, ggridges, ggrepel, ggcorrplot, ggraph and ggdendro.
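As a first sketch of adding a layer, here is a scatter plot via geom_point(). The chic data frame is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# One layer: every row of the data becomes one point
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point()
p
```

Swapping geom_point() for another geom changes the representation while data and aesthetics stay the same.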

2. Layers — Geometries: geom_line()

… or a line plot:

.pull-left[

]

.pull-right[ ]

2. Layers — Geometries: geom_boxplot()

… or a box and whiskers plot:

.pull-left[

]

.pull-right[ ]

2. Layers — Geometries: geom_boxplot()

We need to specify the variable as categorical (year), not as continuous (date):

.pull-left[

]

.pull-right[ ]
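A sketch with a categorical x variable, assuming a simulated stand-in for chic (made-up values, not the NMMAPS data) in which year is a factor:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))

# year is a factor, so we get one box per year
p <- ggplot(chic, aes(x = year, y = temp)) +
  geom_boxplot()
p
```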

2. Layers — Statistical transformations: stat_summary()

A handful of layers focus on the statistical transformation rather than the visual appearance:

.pull-left[

]

.pull-right[ ]
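One possible sketch: the mean and standard error of the temperature per year, computed by stat_summary(). The chic data frame is a simulated stand-in (made-up values, not the NMMAPS data), and the choice of mean_se() is an assumption for illustration:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))

# The stat summarises the raw values per x group; the geom only draws the result
p <- ggplot(chic, aes(x = year, y = temp)) +
  stat_summary(fun.data = mean_se, geom = "pointrange")
p
```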

2. Layers — Statistical transformations: stat_ecdf()

You can also easily plot the empirical cumulative distribution function (ECDF) of a variable:

.pull-left[

]

.pull-right[ ]
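A short sketch for the ECDF of the temperature, assuming a simulated stand-in for chic (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# Only x is mapped; the stat computes the cumulative proportion for the y axis
p <- ggplot(chic, aes(x = temp)) +
  stat_ecdf(geom = "step")
p
```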

2. Layers — Statistical transformations: stat_smooth()

You can specify the fitting method and the formula:

.pull-left[

Other methods such as method = "lm" for linear regressions and method = "glm" for generalized linear models are available as well. ]

.pull-right[ ]
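A sketch of a smoother with an explicit method and formula. The GAM with k = 20 is only an illustrative choice, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# Points in the background, a GAM fit with confidence band on top
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "grey") +
  stat_smooth(method = "gam", formula = y ~ s(x, k = 20))
p
```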

3. Aesthetics: aes()

Aesthetics of the geometric and statistical objects, such as

  • position via x, y, xmin, xmax, ymin, ymax, …

  • colors via color and fill

  • transparency via alpha

  • sizes via size and width

  • shapes via shape and linetype


In general, everything which maps to the data needs to be wrapped in aes()
while static arguments are placed outside the aes().

e.g.
geom_point(aes(color = season)) to color points based on the variable season
geom_point(color = "grey") to color all points in the same color
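The two cases side by side, as a sketch on a simulated stand-in for chic (made-up values; the season column is derived from the month for illustration):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
month <- as.integer(format(chic$date, "%m"))
seasons <- c("Winter", "Spring", "Summer", "Autumn")
chic$season <- factor(seasons[(month %/% 3) %% 4 + 1], levels = seasons)

# Mapped: the color depends on the data, so it goes inside aes()
p_mapped <- ggplot(chic, aes(date, temp)) +
  geom_point(aes(color = season))

# Static: one fixed color for all points, so it goes outside aes()
p_static <- ggplot(chic, aes(date, temp)) +
  geom_point(color = "grey")
```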

3. Aesthetics: aes(color/fill/alpha/size/shape)

… and change the color and the shape based on season and year:

.pull-left[

]

.pull-right[]
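A sketch that maps two variables to two aesthetics at once, assuming a simulated stand-in for chic (made-up values; season and year are derived from the date):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))
month <- as.integer(format(chic$date, "%m"))
seasons <- c("Winter", "Spring", "Summer", "Autumn")
chic$season <- factor(seasons[(month %/% 3) %% 4 + 1], levels = seasons)

# Color encodes the season, shape encodes the year
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(aes(color = season, shape = year))
p
```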

3. Aesthetics: aes(group)

You can create subsets of the data by specifying a grouping variable via group:

.pull-left[


However, for most applications you can simply specify the grouping using visual aesthetics
(color, fill, alpha, shape, linetype). ]

.pull-right[ ]
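A sketch of explicit grouping: one line per year instead of a single line through all observations. chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))

# group splits the data into subsets without changing how they look
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_line(aes(group = year))
p
```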

class: inverse, center, middle

4. Scales
scale_*()

4. Scales: scale_

One can use scale_*() to change properties of all the aesthetic dimensions mapped to the data.


Consequently, there are scale_*() functions for all aesthetics such as:

  • position via scale_x_*() and scale_y_*()

  • colors via scale_color_*() and scale_fill_*()

  • transparency via scale_alpha_*()

  • sizes via scale_size_*()

  • shapes via scale_shape_*() and scale_linetype_*()

… with extensions (*) such as

  • continuous(), discrete(), reverse(), log10(), sqrt(), date(), time() for axes

  • continuous(), discrete(), manual(), gradient(), hue(), brewer() for colors and fills

  • continuous(), discrete(), manual(), ordinal(), identity(), date() for transparencies

  • continuous(), discrete(), manual(), ordinal(), identity(), area(), date() for sizes

  • continuous(), discrete(), manual(), ordinal(), identity() for shapes and linetypes

4. Scales: scale_x_*() and scale_y_*()

… and their properties such as the range, scaling, labels, and axis breaks:

.pull-left[

]

.pull-right[ ]
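A sketch that sets names, range, breaks and labels of both axes. The axis title and the chosen limits are assumptions for illustration, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  # x axis: no title, one break per year, labelled with the year only
  scale_x_date(name = NULL, date_breaks = "1 year", date_labels = "%Y") +
  # y axis: title (hypothetical unit), fixed range and a break every 20 units
  scale_y_continuous(
    name = "Temperature (°F)",
    limits = c(0, 100),
    breaks = seq(0, 100, by = 20)
  )
p
```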

4. Scales: scale_x_*() and scale_y_*()

Some people are annoyed by the extra spacing around the data but we can remove that:

.pull-left[

]

.pull-right[ ]
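One way to sketch this is the expand argument of the position scales, here on a simulated stand-in for chic (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# expand = c(0, 0): no multiplicative and no additive padding around the data
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  scale_x_date(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0))
p
```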

4. Scales: scale_color_*() and scale_fill_*()

… and change the title of the legend:

.pull-left[

]

.pull-right[ ]

4. Scales: scale_shape_*()

… or remove the legend for a specific aesthetic:

.pull-left[

]

.pull-right[ ]
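A sketch that renames the color legend and drops the shape legend. The legend title is a hypothetical choice, and chic is a simulated stand-in (made-up values, not the NMMAPS data). Recent {ggplot2} releases use guide = "none"; older ones also accepted guide = FALSE.

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))
month <- as.integer(format(chic$date, "%m"))
seasons <- c("Winter", "Spring", "Summer", "Autumn")
chic$season <- factor(seasons[(month %/% 3) %% 4 + 1], levels = seasons)

p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(aes(color = season, shape = year)) +
  scale_color_discrete(name = "Season:") +  # new legend title
  scale_shape_discrete(guide = "none")      # no legend for shape
p
```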

4. Scales: scale_color_*() and scale_fill_*() — Color Palettes

5. Coordinate System: coord_*()

Coordinate systems combine the two position aesthetics (usually x and y) to produce a 2d position on the plot.

The meaning of the position aesthetics depends on the coordinate system used:

  • Linear coordinate systems that preserve the shape of geoms:

    • coord_cartesian(): the default with two fixed, perpendicular axes
    • coord_flip(): a Cartesian coordinate system with flipped axes
    • coord_fixed(): a Cartesian coordinate system with a fixed aspect ratio

  • Non-linear coordinate systems that likely change the shapes:

    • coord_map(): map projections
    • coord_polar(): a polar coordinate system
    • coord_trans(): arbitrary transformations to x and y positions

5. Coordinate System: coord_cartesian()

coord_cartesian(ylim = c(min, max)) zooms in without dropping any data. In case you want to remove the data points outside the range instead, use scale_y_continuous(limits = c(min, max)):

.pull-left[

]

.pull-right[

]
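The difference shows most clearly with a statistical layer. The sketch below uses box plots on a simulated stand-in for chic (made-up values, not the NMMAPS data); the range c(0, 50) is an arbitrary choice:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))

base <- ggplot(chic, aes(x = year, y = temp)) + geom_boxplot()

# Zoom: the boxes are computed from all data, only the view is cropped
p_zoom <- base + coord_cartesian(ylim = c(0, 50))

# Scale limits: values above 50 are dropped (with a warning) before
# the box plot statistics are computed, so the medians shift downwards
p_cut <- base + scale_y_continuous(limits = c(0, 50))
```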

5. Coordinate System: coord_flip()

coord_flip() allows you to flip a Cartesian coordinate system:

.pull-left[

]

.pull-right[ ]

5. Coordinate System: coord_trans()

The difference between transforming the scales and transforming the coordinate system is that coordinate transformation occurs after the statistics:

.pull-left[

]

.pull-right[ ]
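A sketch of the two routes with a linear fit, on a simulated stand-in for chic (made-up, strictly positive values, not the NMMAPS data). The log10 transformation is only an illustrative choice:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

base <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Scale transformation: the data are transformed first,
# so the line is fitted to log10(temp) and stays straight
p_scale <- base + scale_y_log10()

# Coordinate transformation: the line is fitted to the raw temp
# and only bent afterwards, when the plot is drawn
p_coord <- base + coord_trans(y = "log10")
```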

5. Coordinate System: coord_map()

Since maps display spherical data, we must project the data via coord_map():

.pull-left[

]

.pull-right[

]

6. Facets: facet_*()

Facetting generates small multiples each showing a different subset of the data:

Adapted from “ggplot2: Elegant Graphics for Data Analysis” by Hadley Wickham

6. Facets: facet_wrap()

facet_wrap() splits the data into small multiples based on one grouping variable:

.pull-left[

The axis ranges can be set free for each subset with scales = "free" (to free only one axis, use "free_x" or "free_y"). ]

.pull-right[ ]
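A sketch of one panel per year with free axes, assuming a simulated stand-in for chic (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)
chic$year <- factor(format(chic$date, "%Y"))

# One small multiple per year; each panel gets its own x and y range
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  facet_wrap(~ year, nrow = 1, scales = "free")
p
```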

class: inverse, center, middle

7. Visual Themes
theme() and theme_*()

7. Visual Themes: theme()

To modify the theme of a plot use theme() in combination with element_*():

.pull-left[

]

.pull-right[ ]
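A sketch of a few theme adjustments. The elements and values are arbitrary illustrations, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  theme(
    axis.title = element_text(size = 16, face = "bold"),  # text element
    panel.grid.major = element_line(color = "grey80"),    # line element
    panel.background = element_rect(fill = "white"),      # rectangle element
    panel.grid.minor = element_blank()                    # remove the element
  )
p
```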

7. Visual Themes: theme_*()

Use a built-in theme of {ggplot2}:

.pull-left[

]

.pull-right[ ]

7. Visual Themes: theme_*()

Using the argument base_size you can change the size of the text:

.pull-left[

]

.pull-right[ ]
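A sketch with a built-in theme and a larger base size. theme_bw() and the value 18 are arbitrary picks, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# base_size sets the reference font size; titles and labels scale relative to it
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  theme_bw(base_size = 18)
p
```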

7. Visual Themes: theme_set() and theme_update()

theme_set() and theme_update() override settings completely or partly:

.pull-left[

]

.pull-right[ ]

7. Visual Themes: theme_set() and theme_update()

theme_set() and theme_update() override settings completely or partly:

.pull-left[

]

.pull-right[ ]
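A sketch of both functions, with arbitrary theme choices and a simulated stand-in for chic (made-up values, not the NMMAPS data). theme_set() returns the previously active theme, which makes it easy to restore:

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# Replace the active theme completely; keep the old one for later
old <- theme_set(theme_bw(base_size = 14))

# Change only single elements of the active theme
theme_update(panel.grid.minor = element_blank())

# Every plot created from now on uses the modified theme
p <- ggplot(chic, aes(x = date, y = temp)) + geom_point()
p

# Restore the previous theme
theme_set(old)
```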

class: inverse, center, middle

Some more?

class: inverse, center, middle

You want even more?

Working with Text: Title, Subtitle, Caption, and Tag

To quickly add a title, use ggtitle():

.pull-left[

]

.pull-right[ ]

Working with Text: Title, Subtitle, Caption, and Tag

{ggplot2} has a built-in structure for title, subtitle, caption and tags:

.pull-left[

]

.pull-right[ ]
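A sketch using labs(). All label texts are hypothetical, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  labs(
    title = "Temperatures in Chicago",     # hypothetical texts
    subtitle = "Daily values, 1997 to 2000",
    caption = "Simulated stand-in data",
    tag = "A",
    x = NULL,
    y = "Temperature"
  )
p
```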

Working with Text: annotate("text")

The annotate() function comes with {ggplot2} and adds a geom, here a text label, with positions passed as vectors instead of mapped from a data frame:

.pull-left[

]

.pull-right[ ]
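A sketch of a single text annotation. Position and label are arbitrary, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# x and y are given in data units; x is a date because the x axis is a date axis
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "grey") +
  annotate(
    "text",
    x = as.Date("1999-01-01"), y = 90,
    label = "Some\nadditional text",
    size = 5, fontface = "bold"
  )
p
```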

Working with Text: element_markdown() via {ggtext}

With the new {ggtext} package, it is possible to use Markdown and basic HTML within strings:

.pull-left[

]

.pull-right[ ]

Working with Images: annotation_custom(grob)

The annotation_custom() function comes with ggplot2 and is designed to use a so-called grob as input:

.pull-left[

]

.pull-right[ ]
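A sketch with a tiny raster built from a matrix of colors in place of an image file (an assumption that keeps the block free of external files). The grob comes from {grid}, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)
library(grid)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# A 2 x 2 "image"; a real image would be read from a file instead
img <- matrix(c("firebrick", "grey90", "grey90", "firebrick"), nrow = 2)

# The grob is placed in the lower right corner, in relative (npc) units
logo <- rasterGrob(img, x = 0.9, y = 0.12, width = unit(0.12, "npc"),
                   interpolate = FALSE)

# By default the annotation spans the whole panel
p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "grey") +
  annotation_custom(logo)
p
```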

class: inverse, center, middle

Working with Geometric Forms

Working with Geometric Forms: annotate("rect") or geom_rect()?

If you plot many elements, use geom_rect():

.pull-left[

]

.pull-right[ ]
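Both options as a sketch: annotate("rect") for a single rectangle, geom_rect() with its own data frame for several. The highlighted periods are arbitrary, and chic is a simulated stand-in (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# One rectangle: pass the corners directly
p_one <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point() +
  annotate("rect",
           xmin = as.Date("1998-01-01"), xmax = as.Date("1998-12-31"),
           ymin = -Inf, ymax = Inf, alpha = 0.2)

# Many rectangles: one row per rectangle in a separate data frame
rects <- data.frame(
  xmin = as.Date(c("1997-01-01", "1999-01-01")),
  xmax = as.Date(c("1997-12-31", "1999-12-31"))
)
p_many <- ggplot(chic, aes(x = date, y = temp)) +
  geom_rect(data = rects,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, alpha = 0.2) +
  geom_point()
```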

Working with Geometric Forms: geom_vline()

We could also indicate new years by a vertical line:

.pull-left[

There are also geom_abline() and geom_hline(). ]

.pull-right[ ]
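A sketch with one vertical line per new year, assuming a simulated stand-in for chic (made-up values, not the NMMAPS data):

```r
library(ggplot2)

# Simulated stand-in for the Chicago data (made-up values, not NMMAPS)
set.seed(1)
chic <- data.frame(date = seq(as.Date("1997-01-01"), as.Date("2000-12-31"), by = "day"))
doy <- as.numeric(format(chic$date, "%j"))
chic$temp <- 50 + 22 * sin(2 * pi * (doy - 110) / 365) + rnorm(nrow(chic), sd = 5)

# The x axis is a date axis, so the intercepts are given as dates
new_years <- as.Date(c("1998-01-01", "1999-01-01", "2000-01-01"))

p <- ggplot(chic, aes(x = date, y = temp)) +
  geom_point(color = "grey") +
  geom_vline(xintercept = new_years, linetype = "dashed")
p
```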